Gene expression module discovery using Gibbs sampling.

Authors

  • Chang-Jiun Wu
  • Yutao Fu
  • T M Murali
  • Simon Kasif
Abstract

Recent advances in high-throughput profiling of gene expression have catalyzed explosive growth in functional genomics aimed at identifying genes that are differentially expressed in various tissues or cell types across a range of experimental conditions. These studies can lead to the identification of diagnostic genes, classification of genes into functional categories, association of genes with regulatory pathways, and clustering of genes into modules that are potentially co-regulated by a group of transcription factors. Traditional clustering methods such as hierarchical clustering or principal component analysis are difficult to deploy effectively for several of these tasks, since genes rarely exhibit similar expression patterns across a wide range of conditions. Bi-clustering of gene expression data is a promising methodology for identifying groups of genes that show a coherent expression profile across a subset of conditions. This methodology can be a first step towards the discovery of co-regulated and co-expressed genes, or modules. Although bi-clustering (also called block clustering) was introduced in statistics in 1974, few robust and efficient solutions exist for extracting gene expression modules from microarray data. In this paper, we propose a simple but promising new approach to bi-clustering based on a Gibbs sampling paradigm. Our algorithm is implemented in the program GEMS (Gene Expression Module Sampler). GEMS has been tested both on synthetic data, generated to evaluate the effect of noise on the algorithm's performance, and on published leukemia datasets. In preliminary studies comparing GEMS with other bi-clustering software, we show that GEMS is a reliable, flexible and computationally efficient approach for bi-clustering gene expression data.
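The abstract does not spell out the GEMS sampler itself. As a rough illustration of how Gibbs sampling can drive bi-clustering, here is a minimal sketch under a simple model that is assumed here and not taken from the paper: the expression matrix is discretized to binary values (e.g. 1 = "up-regulated"), entries inside the bicluster are 1 with probability `p_in` and all other entries with probability `p_bg`, and each gene and each condition is resampled in turn from its full conditional given the current membership of the other dimension. The names `gibbs_bicluster`, `log_lr`, `p_in`, `p_bg` and `prior` are hypothetical.

```python
import math
import random


def log_lr(x, p_in, p_bg):
    """Log-likelihood ratio for one binary entry: inside vs. outside the bicluster."""
    if x:
        return math.log(p_in / p_bg)
    return math.log((1.0 - p_in) / (1.0 - p_bg))


def gibbs_bicluster(X, p_in=0.9, p_bg=0.2, prior=0.1, sweeps=200, seed=0):
    """Gibbs-sample gene/condition indicators for a single bicluster.

    Hypothetical model (an illustration, not the GEMS model): X is a binary
    genes x conditions matrix; entries inside the bicluster are 1 with
    probability p_in, all others with probability p_bg; every gene and every
    condition joins the bicluster a priori with probability `prior`.
    Returns the highest-posterior (genes, conditions) state visited.
    """
    rng = random.Random(seed)
    n_genes, n_conds = len(X), len(X[0])
    L = [[log_lr(x, p_in, p_bg) for x in row] for row in X]
    prior_logit = math.log(prior / (1.0 - prior))
    log_in, log_out = math.log(prior), math.log(1.0 - prior)

    def draw(logit):
        # Bernoulli draw with P(True) = sigmoid(logit), computed stably.
        if logit >= 0:
            p = 1.0 / (1.0 + math.exp(-logit))
        else:
            e = math.exp(logit)
            p = e / (1.0 + e)
        return rng.random() < p

    def log_post(zg, wc):
        # Log joint probability of (indicators, data) up to an additive constant.
        lp = sum(log_in if z else log_out for z in zg)
        lp += sum(log_in if w else log_out for w in wc)
        lp += sum(L[g][c] for g in range(n_genes) if zg[g]
                  for c in range(n_conds) if wc[c])
        return lp

    in_gene = [rng.random() < 0.5 for _ in range(n_genes)]
    in_cond = [rng.random() < 0.5 for _ in range(n_conds)]
    best, best_lp = (list(in_gene), list(in_cond)), log_post(in_gene, in_cond)

    for _ in range(sweeps):
        # A gene's full conditional depends only on the current condition set.
        for g in range(n_genes):
            in_gene[g] = draw(prior_logit + sum(L[g][c] for c in range(n_conds) if in_cond[c]))
        # A condition's full conditional depends only on the current gene set.
        for c in range(n_conds):
            in_cond[c] = draw(prior_logit + sum(L[g][c] for g in range(n_genes) if in_gene[g]))
        lp = log_post(in_gene, in_cond)
        if lp > best_lp:
            best, best_lp = (list(in_gene), list(in_cond)), lp

    genes = [g for g, z in enumerate(best[0]) if z]
    conds = [c for c, w in enumerate(best[1]) if w]
    return genes, conds


# Usage: plant an 8 x 8 block of ones in a 20 x 16 matrix of zeros.
X = [[1 if g < 8 and c < 8 else 0 for c in range(16)] for g in range(20)]
genes, conds = gibbs_bicluster(X, seed=1)
print("genes:", genes)
print("conditions:", conds)
```

Alternating the gene and condition updates is what lets a sampler like this settle on a block that is coherent only on a subset of conditions, the situation the abstract says defeats whole-profile clustering. With a strong planted block such as the one in the usage lines, the best visited state should usually coincide with the planted genes and conditions, but a single run is not guaranteed to, so comparing several seeds is a sensible precaution.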


Similar resources

An improved Gibbs sampling method for motif discovery via sequence weighting.

The discovery of motifs in DNA sequences remains a fundamental and challenging problem in computational molecular biology and regulatory genomics, although a large number of computational methods have been proposed in the past decade. Among these methods, the Gibbs sampling strategy has shown great promise and is routinely used for finding regulatory motif elements in the promoter regions of co...


Identification of Prognostic Genes in Her2-enriched Breast Cancer by Gene Co-Expression Network Analysis

Introduction: The HER2-enriched subtype of breast cancer has a worse prognosis than luminal subtypes. Recently, the discovery of targeted therapies in other groups of breast cancer has increased patient survival. The aim of this study was to identify genes that affect the overall survival of this group of patients based on a systems biology approach. Methods: Gene expression data and clinical infor...


A hierarchical Bayesian model for flexible module discovery in three-way time-series data

MOTIVATION: Detecting modules of co-ordinated activity is fundamental in the analysis of large biological studies. For two-dimensional data (e.g. genes × patients), this is often done via clustering or biclustering. More recently, studies monitoring patients over time have added another dimension. Analysis is much more challenging in this case, especially when time measurements are not synchroni...


Integrating quality-based clustering of microarray data with Gibbs sampling for the discovery of regulatory motifs

In microarray experiments, genes exhibiting a similar expression profile are potentially coregulated. Clustering identifies such groups of coexpressed genes, whose upstream regions can then be searched for putative regulatory elements. We present two algorithms and an interactive web-based user interface that integrate cluster analysis and motif finding for the analysis of microarray data. Startin...


A Combined Model and a Varied Gibbs Sampling Algorithm Used for Motif Discovery

The conserved sequences in gene regulatory regions dominate gene regulation. Discovering these sequences and their functions is important in the post-genome era. A novel model is constructed to represent conserved motifs of DNA sequences. This model is a combination of PWM and WAM models. The advantage is that the new model not only can comprise individual base frequencies in the motifs, but also can em...



Journal:
  • Genome informatics. International Conference on Genome Informatics

Volume 15, Issue 1


Published: 2004